Vectors of Locally Aggregated Centers for Compact Video Representation
We propose a novel vector aggregation technique for compact video
representation, with application in accurate similarity detection within large
video datasets. The current state-of-the-art in visual search is formed by the
vector of locally aggregated descriptors (VLAD) of Jégou et al. VLAD generates
compact video representations based on scale-invariant feature transform (SIFT)
vectors (extracted per frame) and local feature centers computed over a
training set. With the aim to increase robustness to visual distortions, we
propose a new approach that operates at a coarser level in the feature
representation. We create vectors of locally aggregated centers (VLAC) by first
clustering SIFT features to obtain local feature centers (LFCs) and then
encoding the latter with respect to given centers of local feature centers
(CLFCs), extracted from a training set. The sums of differences between the LFCs
and the CLFCs are aggregated to generate an extremely compact video description
used for accurate video segment similarity detection. Experimentation using a
video dataset, comprising more than 1000 minutes of content from the Open Video
Project, shows that VLAC obtains substantial gains in terms of mean Average
Precision (mAP) against VLAD and the hyper-pooling method of Douze et al.,
under the same compaction factor and the same set of distortions.
Comment: Proc. IEEE International Conference on Multimedia and Expo, ICME 2015, Torino, Italy
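The aggregation pipeline described above (per-frame SIFT vectors, clustered into local feature centers, then encoded as sum-of-differences residuals against training-set CLFCs) can be sketched in a few lines of NumPy. This is a minimal illustrative toy, assuming plain k-means for both clustering stages and small random vectors in place of real SIFT descriptors; all function names are our own, not the authors' implementation.

```python
import numpy as np

def kmeans(X, k, iters=20, seed=0):
    """Plain Lloyd's k-means; stands in for the clustering steps in the abstract."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), k, replace=False)]
    for _ in range(iters):
        labels = np.argmin(((X[:, None] - centers[None]) ** 2).sum(-1), axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    return centers

def vlac_encode(sift_features, clfcs, n_lfcs=8):
    """VLAC sketch: cluster a segment's features into local feature centers
    (LFCs), then aggregate LFC-minus-CLFC residuals per nearest CLFC,
    mirroring what VLAD does one level down, on raw descriptors."""
    lfcs = kmeans(sift_features, n_lfcs)                      # coarser level
    nearest = np.argmin(((lfcs[:, None] - clfcs[None]) ** 2).sum(-1), axis=1)
    desc = np.zeros_like(clfcs)
    for i, j in enumerate(nearest):
        desc[j] += lfcs[i] - clfcs[j]                         # sum of differences
    v = desc.ravel()
    return v / (np.linalg.norm(v) + 1e-12)                    # L2-normalize

# toy usage: 200 fake 16-D "SIFT" vectors, 4 CLFCs from a fake "training set"
rng = np.random.default_rng(1)
clfcs = kmeans(rng.normal(size=(500, 16)), 4)
code = vlac_encode(rng.normal(size=(200, 16)), clfcs)
```

The resulting code has dimension (number of CLFCs) x (descriptor dimension), so compactness is controlled by how few CLFCs are kept.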
Leaping Into Memories: Space-Time Deep Feature Synthesis
The success of deep learning models has led to their adaptation and adoption
by prominent video understanding methods. The majority of these approaches
encode features in a joint space-time modality for which the inner workings and
learned representations are difficult to visually interpret. We propose LEArned
Preconscious Synthesis (LEAPS), an architecture-agnostic method for
synthesizing videos from the internal spatiotemporal representations of models.
Using a stimulus video and a target class, we prime a fixed space-time model
and iteratively optimize a video initialized with random noise. We incorporate
additional regularizers to improve the feature diversity of the synthesized
videos as well as the cross-frame temporal coherence of motions. We
quantitatively and qualitatively evaluate the applicability of LEAPS by
inverting a range of spatiotemporal convolutional and attention-based
architectures trained on Kinetics-400, which to the best of our knowledge has
not been previously accomplished.
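The optimization loop sketched in the abstract (prime a frozen model with a target class, start from random noise, and iteratively ascend the class score while regularizing cross-frame coherence) can be illustrated end-to-end with a toy differentiable stand-in, since the real method inverts large pretrained networks. Everything below, the linear "model", the tiny video size, and the coherence penalty, is an illustrative assumption, not the LEAPS implementation.

```python
import numpy as np

# Toy stand-in for a fixed space-time model: a linear scorer over a
# flattened clip, score(v) = w @ v.ravel(), with an analytic gradient.
rng = np.random.default_rng(0)
T, H, W = 4, 8, 8                       # tiny "video": frames x height x width
w = rng.normal(size=T * H * W)          # frozen weights for the target class

def score(v):
    return w @ v.ravel()

def synthesize(steps=200, lr=0.01, lam=0.1):
    """Gradient-ascend the class score from noise, with a cross-frame
    temporal-coherence penalty lam * sum((v[t+1] - v[t])**2)."""
    v = rng.normal(scale=0.1, size=(T, H, W))   # start from random noise
    for _ in range(steps):
        g_score = w.reshape(T, H, W)            # d(score)/dv for the linear model
        d = v[1:] - v[:-1]                      # gradient of the coherence penalty
        g_tc = np.zeros_like(v)
        g_tc[1:] += 2 * d
        g_tc[:-1] -= 2 * d
        v += lr * (g_score - lam * g_tc)        # ascend score, keep frames coherent
    return v

v = synthesize()
```

With a real network the analytic gradient would be replaced by backpropagation through the frozen model, but the loop structure is the same.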
Uni-NLX: Unifying Textual Explanations for Vision and Vision-Language Tasks
Natural Language Explanations (NLE) aim at supplementing the prediction of a
model with human-friendly natural text. Existing NLE approaches involve
training separate models for each downstream task. In this work, we propose
Uni-NLX, a unified framework that consolidates all NLE tasks into a single and
compact multi-task model using a unified training objective of text generation.
Additionally, we introduce two new NLE datasets: 1) ImageNetX, a dataset of
144K samples for explaining ImageNet categories, and 2) VQA-ParaX, a dataset of
123K samples for explaining the task of Visual Question Answering (VQA). Both
datasets are derived leveraging large language models (LLMs). By training on
the 1M combined NLE samples, our single unified framework is capable of
simultaneously performing seven NLE tasks including VQA, visual recognition and
visual reasoning tasks with 7X fewer parameters, demonstrating comparable
performance to the independent task-specific models in previous approaches, and
in certain tasks even outperforming them. Code is at
https://github.com/fawazsammani/uni-nlx
Comment: Accepted to ICCVW 202
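Consolidating heterogeneous NLE tasks into one text-generation objective, as described above, amounts to casting every sample as a (prompt, target) text pair with a task prefix telling the single model which task it is serving. The sketch below shows this idea only; the prefix strings and field names are illustrative assumptions, not Uni-NLX's actual data format.

```python
def to_unified_sample(task, inputs, answer, explanation):
    """Render any NLE sample as one text-to-text pair: a task-prefixed
    prompt and an 'answer because explanation' target, so a single
    generator can be trained on all tasks at once."""
    prompt = f"[{task}] " + " ".join(f"{k}: {v}" for k, v in inputs.items())
    target = f"{answer} because {explanation}"
    return prompt, target

# Two tasks, one sample format (contents are made-up examples):
vqa = to_unified_sample(
    "vqa",
    {"question": "What is the man holding?"},
    "a surfboard",
    "he is carrying a long board under his arm on the beach",
)
rec = to_unified_sample(
    "recognition",
    {"label": "golden retriever"},
    "golden retriever",
    "the dog has a long golden coat and a broad head",
)
```

Because every task shares this single format, one model and one loss cover all of them, which is what allows the 7x parameter reduction over per-task models.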
Fast Desynchronization For Decentralized Multichannel Medium Access Control
Distributed desynchronization algorithms are key to wireless sensor networks
as they allow for medium access control in a decentralized manner. In this
paper, we view desynchronization primitives as iterative methods that solve
optimization problems. In particular, by formalizing a well-established
desynchronization algorithm as a gradient descent method, we establish novel
upper bounds on the number of iterations required to reach convergence.
Moreover, by using Nesterov's accelerated gradient method, we propose a novel
desynchronization primitive that provides for faster convergence to the steady
state. Importantly, we propose a novel algorithm that leads to decentralized
time-synchronous multichannel TDMA coordination by formulating this task as an
optimization problem. Our simulations and experiments on a densely-connected
IEEE 802.15.4-based wireless sensor network demonstrate that our scheme
provides for faster convergence to the steady state, robustness to hidden
nodes, higher network throughput and comparable power dissipation with respect
to the recently standardized IEEE 802.15.4e-2012 time-synchronized channel
hopping (TSCH) scheme.
Comment: to appear in IEEE Transactions on Communications
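The gradient-descent view of desynchronization can be made concrete with the classic midpoint update: each node pulls its firing phase toward the midpoint of its two phase neighbors on the unit circle, which is exactly a descent step on the sum of squared deviations from fair spacing. The sketch below simulates that plain primitive; the paper's Nesterov-accelerated variant would add a momentum lookahead to the same update. Parameter values and function names are illustrative assumptions.

```python
import numpy as np

def desync_step(theta, alpha=0.5):
    """One synchronous round of the midpoint (Desync-style) update:
    each node moves its phase toward the midpoint of its two phase
    neighbors, with wrap-around on the unit circle."""
    n = len(theta)
    order = np.argsort(theta)
    s = theta[order]
    new = s.copy()
    for k in range(n):
        prev = s[k - 1] if k > 0 else s[-1] - 1.0   # neighbor one period back
        nxt = s[k + 1] if k < n - 1 else s[0] + 1.0 # neighbor one period ahead
        new[k] = (1 - alpha) * s[k] + alpha * (prev + nxt) / 2.0
    out = np.empty(n)
    out[order] = np.mod(new, 1.0)
    return out

def gaps(theta):
    """Circular inter-fire gaps; fair TDMA means all gaps equal 1/n."""
    s = np.sort(theta)
    return np.diff(np.concatenate([s, [s[0] + 1.0]]))

rng = np.random.default_rng(0)
theta = rng.random(8)          # 8 nodes, random initial firing phases
for _ in range(200):
    theta = desync_step(theta)
```

After enough rounds the gaps equalize at 1/n, i.e. the nodes have self-organized into a fair TDMA schedule with no coordinator.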
Convergence of Desynchronization Primitives in Wireless Sensor Networks: A Stochastic Modeling Approach
Desynchronization approaches in wireless sensor networks converge to
time-division multiple access (TDMA) of the shared medium without requiring
clock synchronization amongst the wireless sensors, or indeed the presence of a
central (coordinator) node. All such methods are based on the principle of
reactive listening of periodic "fire" or "pulse" broadcasts: each node updates
the time of its fire message broadcasts based on received fire messages from
some of the remaining nodes sharing the given spectrum. In this paper, we
present a novel framework to estimate the required iterations for convergence
to fair TDMA scheduling. Our estimates are fundamentally different from
previous conjectures or bounds found in the literature as, for the first time,
convergence to TDMA is defined in a stochastic sense. Our analytic results
apply to the Desync algorithm and to pulse-coupled oscillator algorithms with
inhibitory coupling. The experimental evaluation via iMote2 TinyOS nodes (based
on the IEEE 802.15.4 standard) as well as via computer simulations demonstrates
that, for the vast majority of settings, our stochastic model is within one
standard deviation from the experimentally-observed convergence iterations. The
proposed estimates are thus shown to characterize the desynchronization
convergence iterations significantly better than existing conjectures or
bounds. Therefore, they contribute towards the analytic understanding of how a
desynchronization-based system is expected to evolve from random initial
conditions to the desynchronized steady state.
Comment: to appear, IEEE Transactions on Signal Processing, 201
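The quantity this abstract models, the number of iterations from random initial phases to fair TDMA, is itself random, so a natural baseline is a Monte Carlo estimate of its mean and spread. The sketch below counts convergence rounds for a synchronous midpoint update written directly on the inter-fire gaps; the update form, tolerance, and trial count are simplifying assumptions of this illustration, not the paper's model.

```python
import numpy as np

def desync_gaps_step(g, alpha=0.5):
    # One synchronous round of the midpoint update, expressed on the
    # circular gap vector (which always sums to one period): each gap
    # is pulled toward the average of its two circular neighbors.
    return (1 - alpha) * g + (alpha / 2) * (np.roll(g, 1) + np.roll(g, -1))

def iterations_to_tdma(n, tol=0.01, alpha=0.5, rng=None):
    """Rounds until every gap is within tol of the fair share 1/n,
    starting from n uniformly random firing phases."""
    rng = rng or np.random.default_rng()
    phases = np.sort(rng.random(n))
    g = np.diff(np.concatenate([phases, [phases[0] + 1.0]]))
    it = 0
    while np.abs(g - 1.0 / n).max() > tol:
        g = desync_gaps_step(g, alpha)
        it += 1
    return it

# Monte Carlo over random initial conditions: the convergence-iteration
# distribution that a stochastic model of desynchronization describes.
rng = np.random.default_rng(0)
trials = [iterations_to_tdma(8, rng=rng) for _ in range(200)]
mean, std = np.mean(trials), np.std(trials)
```

Comparing such empirical means and standard deviations against an analytic estimate is precisely the kind of validation the abstract reports (model within one standard deviation of observed iterations).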
Visualizing and Understanding Contrastive Learning
Contrastive learning has revolutionized the field of computer vision,
learning rich representations from unlabeled data, which generalize well to
diverse vision tasks. Consequently, it has become increasingly important to
explain these approaches and understand their inner workings. Given
that contrastive models are trained with interdependent and interacting inputs
and aim to learn invariance through data augmentation, the existing methods for
explaining single-image systems (e.g., image classification models) are
inadequate as they fail to account for these factors. Additionally, there is a
lack of evaluation metrics designed to assess pairs of explanations, and no
analytical studies have been conducted to investigate the effectiveness of
different techniques used to explain contrastive learning. In this work, we
design visual explanation methods that contribute towards understanding
similarity learning tasks from pairs of images. We further adapt existing
metrics, used to evaluate visual explanations of image classification systems,
to suit pairs of explanations and evaluate our proposed methods with these
metrics. Finally, we present a thorough analysis of visual explainability
methods for contrastive learning, establish their correlation with downstream
tasks and demonstrate the potential of our approaches to investigate their
merits and drawbacks.
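One simple way to explain a similarity model for a *pair* of inputs, rather than a single image, is occlusion: mask one patch of image A at a time and record how much the pair's embedding similarity drops. The sketch below shows that pairwise variant; the quadrant-mean "encoder" is a toy stand-in for a contrastive model, and the method is a generic illustration, not the specific techniques proposed in this work.

```python
import numpy as np

def encode(img):
    # Toy embedding: per-quadrant means of an (H, W) image; a real
    # contrastive encoder would go here instead.
    h, w = img.shape[0] // 2, img.shape[1] // 2
    return np.array([img[:h, :w].mean(), img[:h, w:].mean(),
                     img[h:, :w].mean(), img[h:, w:].mean()])

def cosine(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12)

def pairwise_occlusion_map(img_a, img_b, patch=4):
    """Saliency for the PAIR: similarity drop when each patch of
    img_a is occluded, holding img_b's embedding fixed."""
    base = cosine(encode(img_a), encode(img_b))
    z_b = encode(img_b)
    sal = np.zeros_like(img_a)
    for i in range(0, img_a.shape[0], patch):
        for j in range(0, img_a.shape[1], patch):
            occ = img_a.copy()
            occ[i:i + patch, j:j + patch] = 0.0       # occlude one patch
            sal[i:i + patch, j:j + patch] = base - cosine(encode(occ), z_b)
    return sal

rng = np.random.default_rng(0)
img_a = rng.random((8, 8))
img_b = img_a + 0.05 * rng.random((8, 8))             # a "positive" pair
sal = pairwise_occlusion_map(img_a, img_b)
```

Because the explanation is conditioned on the second image, it captures the interdependence of inputs that single-image attribution methods miss, which is the gap the abstract identifies.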